Abstract


One of the key challenges in surveillance video face screening against a Watch List is that
faces in surveillance video are often observed at an angle different from the angle at which the
faces are captured in the Watch List. In particular, facial images in surveillance video are normally
observed at various pose angles and from above eye level. In contrast, mugshots (reference facial
images) stored in databases are regularly captured in a frontal pose and at eye level, which causes
poor matching between the images. One way to overcome this problem is advanced preprocessing
of stored images. It is possible to synthetically generate variations of the reference facial
images of target individuals under the same conditions (e.g. pose angle) under which they will
most likely be observed in a video. While several commercial tools exist, an open source library
is also available to generate a 3D face model from arbitrary 2D facial images. This library, called
Candide, may allow academia and industry to significantly improve the matching performance of their
algorithms in video surveillance applications. This report gives an overview of this library and
analyzes its suitability for the problem.

Keywords:  video-surveillance,  face  recognition  in  video,  instant  face  recognition,  watch-list 
screening,  biometrics,  reliability,  performance  evaluation 
1 Introduction


As discussed in [1], one of the key challenges in surveillance video face screening against a Watch
List is that faces in surveillance video are often observed at an angle different from the
angle at which the faces are captured in the Watch List. In particular, facial images in surveillance
video are normally observed from above eye level, whereas mugshot facial images stored in
databases are regularly captured at eye level, which causes poor matching between the images. One
way to overcome the problem is advanced preprocessing of stored images to synthetically
generate the facial images of target individuals in the poses in which they will most likely be
observed in video. While there are several commercial tools for this purpose, such as the ones developed
by CMU¹, Animetrics² and ReproFace used by the FBI³, there also exists an open source library
that allows one to generate a 3D face model from arbitrary 2D facial images. This library, called
Candide [2, 3], may allow academia and industry to significantly improve the matching
performance of their algorithms in video surveillance applications.

Candide also makes it possible to generate facial images from partially visible faces. A possible
situation is the case where the surveillance team has a video sequence in which the subject’s face is
partially covered. A single video frame can then be used to generate new poses by adjusting
the face mask to the angle of the face appearing in the video. One limitation is that if the hidden
part of the face carries some mark, like a scar, Candide will not generate this mark on the unseen
face part. The new poses can be added to the training data of the Machine Learning (ML) algorithm
to improve its accuracy in detecting that particular subject.

Another point that deserves attention is that, if only one picture per subject is available
for the learning process, it is practically impossible for an ML system to build an accurate
model of the subject’s face. Synthetically generating more facial images of the same subject would
allow more accurate face models to be built, yielding an overall improvement in face recognition
performance.

In the following we summarize some background information on synthetic face generation,
describe how the Candide face mask can be useful in face recognition systems, and present a
discussion of future work.

¹ CNN - “How CMU Biometrics Center Face Recognition Could Help Boston”, May 1, 2013. CMU 3D Face
Modeling research: http://www.cmu-biometrics.org

² 2D-3D FACEngine Face Recognition Performance Based on SetPose Geometric Normalization:
http://animetrics.com/face-recognition-based-on-setpose-geometric-normalization/. Making Faces ID-Ready (The
world’s leading 2D to 3D face biometric forensics tool, supporting 45 pose correction, making faces “ID-Ready” for
any facial recognition system): https://id.ready.animetrics.com/

³ Richard W. Vorder Bruegge, Facial Recognition and Identification Initiatives, Federal Bureau
of Investigation, Biometric Consortium Conference (BCC 2010), Sept. 2010. Online:
http://biometrics.org/bc2010/presentations/DOJ/vorder_bmegge-Facial-Recognition-and-Identification-Initiatives.pdf.
2 Background - Synthetic Face Generation

In synthetic face generation approaches, a 2D face image is typically mapped onto a deformable
3D face model, which is then used to generate synthetic 2D faces at different angles
and poses. A 3D face model can be generated automatically from one or more images and then
adjusted manually by mapping facial features onto the internal face model. Afterwards, a
morphable face model can be derived by transforming the shape and texture of the 3D face model
into a set of vectors. A linear combination of the vector prototypes obtained in this way allows one
to model new facial appearances and expressions.
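
The linear combination of prototypes can be illustrated with a minimal sketch. It is not taken from
any of the cited systems: the function name `combine_prototypes` and the two toy shape vectors are
hypothetical, and a real morphable model would use vectors holding thousands of vertex coordinates
and texture values.

```python
def combine_prototypes(prototypes, weights):
    """Return the weighted sum of equally sized prototype vectors."""
    if len(prototypes) != len(weights):
        raise ValueError("need exactly one weight per prototype")
    length = len(prototypes[0])
    if any(len(p) != length for p in prototypes):
        raise ValueError("all prototype vectors must have the same length")
    return [
        sum(w * p[i] for w, p in zip(weights, prototypes))
        for i in range(length)
    ]


# Toy shape vectors (assumed values): flattened (x, y, z) of two vertices.
neutral = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
smiling = [0.0, 0.5, 0.0, 1.0, 0.5, 0.0]

# Non-negative weights that sum to 1 give a shape between the prototypes.
blend = combine_prototypes([neutral, smiling], [0.75, 0.25])
```

Here `blend` is a shape one quarter of the way from the neutral prototype to the smiling one. The
same function applies unchanged to texture vectors.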

A combination of 3D morphable models and component-based recognition has been used for
building pose and illumination invariant FR systems [5]. Three input faces of each person are
employed in [5] to compute morphable 3D face models, which are then used to build a large set of
synthetic faces under different viewpoints and lighting conditions for training a component-based
FR system. The initial database of 3D models was built with a 3D laser scanner. By morphing between
the existing models in the database, poses within a range of +/- 45 degrees of rotation in
depth and +/- 10 degrees of rotation in the image plane can be achieved, using two illumination
models for each pose.

Synthetic face cubes extracted from original face images in both frontal and 20-degree side
views are introduced in [7], based on head shapes and feature locations, in order to match synthetic
faces. The geometric difference between the faces in a four-dimensional face subspace, measured
using a local Euclidean distance, is used as a metric in the face space.

Recently, a morphing procedure has been proposed to create a training set for designing a
user-specific face recognizer using a combination of two parallel classifiers, one based on Gabor features
and the other based on Local Binary Patterns (LBP) [8, 1]. In the morphing procedure, borderline
faces are generated between each target face and random non-target faces. The morphed faces can
be similar to each other: the less morphed faces can be considered borderline patterns of
positive training samples, while the more deeply morphed faces correspond to borderline patterns of
negative training samples.
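
The following sketch illustrates the idea of labelling morphed faces by morph depth. It is only an
approximation under stated assumptions: a pixel-wise cross-dissolve stands in for the morphing
procedure of [8, 1] (which is not described here), and the `threshold` of 0.5 that separates
positive from negative borderline samples is a hypothetical choice, as are the function names and
the toy 2x2 images.

```python
def cross_dissolve(target, other, depth):
    """Blend two equally sized grayscale images; depth 0 keeps the target."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must lie in [0, 1]")
    return [
        [(1.0 - depth) * t + depth * o for t, o in zip(row_t, row_o)]
        for row_t, row_o in zip(target, other)
    ]


def borderline_samples(target, non_targets, depths, threshold=0.5):
    """Return (image, label) pairs: label 1 is positive, 0 is negative."""
    samples = []
    for other in non_targets:
        for depth in depths:
            # Assumed rule: lightly morphed faces still count as the target.
            label = 1 if depth < threshold else 0
            samples.append((cross_dissolve(target, other, depth), label))
    return samples


# Toy 2x2 "faces" with assumed intensity values.
target_face = [[0, 100], [200, 40]]
other_face = [[100, 100], [0, 240]]
samples = borderline_samples(target_face, [other_face], [0.25, 0.75])
```

With these inputs, the sample morphed at depth 0.25 is labelled positive and the one at depth 0.75
negative.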

In [9], virtual samples are constructed from a single face image using a wavelet transform.
First, a 2D wavelet transform is applied to decompose a facial image into four regions in the
frequency domain. Then, virtual samples are generated by rotating the image in different directions.
Principal Component Analysis (PCA) is used for classification.

The wavelet transform is a time-frequency scale transformation that was developed from the Fourier
transform. A single level of the 2D decomposition splits an image into four regions in the frequency
domain: the low-frequency region LL (approximation component) and the high-frequency regions
LH (horizontal component), HL (vertical component), and HH (diagonal component). In a 2-level
wavelet decomposition, for example, the second decomposition is computed on the LL1 region. Each
face is thus divided into four sub-images of equal dimensions, each a quarter of the size of the
original image.
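
As an illustration, one decomposition level can be sketched with the Haar wavelet. The choice of the
Haar wavelet, the averaging normalization (division by 4) and the function name `haar_level` are
assumptions made for this example; [9] may use a different wavelet, and sub-band naming conventions
differ between libraries.

```python
def haar_level(image):
    """One level of an averaging 2D Haar decomposition.

    `image` is a list of rows with even height and width. Returns the four
    sub-bands (LL, LH, HL, HH), each half the height and width of the input.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, height, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, width, 2):
            a, b = image[r][c], image[r][c + 1]          # top pixel pair
            d, e = image[r + 1][c], image[r + 1][c + 1]  # bottom pixel pair
            ll_row.append((a + b + d + e) / 4)  # approximation (average)
            lh_row.append((a + b - d - e) / 4)  # top vs bottom: horizontal
            hl_row.append((a - b + d - e) / 4)  # left vs right: vertical
            hh_row.append((a - b - d + e) / 4)  # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


# Toy 4x4 image with assumed intensity values.
image = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [2, 4, 0, 8],
    [6, 8, 0, 8],
]
ll1, lh1, hl1, hh1 = haar_level(image)   # first level: four 2x2 sub-bands
ll2, lh2, hl2, hh2 = haar_level(ll1)     # second level, computed on LL1
```

Each first-level sub-band has a quarter of the pixels of the input, and the second call shows how a
2-level decomposition reuses only the LL1 region.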

In [10], a method called single image subspace (SIS) is proposed for single-sample-per-person
problems. It represents each single image as a subspace spanned by its synthesized images.
Synthesized samples are used to generate subspaces, which can be constructed in three ways: 1) from the
entire extended training set, 2) from all synthesized images of the subject, or 3) from all images
that passed a common filter criterion.

Several other approaches to 3D modeling and its use for face recognition in video are presented
in [6].


3  Candide 

One of the simple, popular, and publicly available tools to generate a 3D face model from arbitrary
2D facial images is Candide [2, 3], created by Mikael Rydfalk at Linköping University in
1987 [2]. Candide uses a parametrized face mask that was specifically developed for model-based
coding of human faces. It allows fast, computationally inexpensive generation of synthetic faces using
an image of a frontal face or several images of partially occluded faces captured under different
poses and angles. The constructed 3D model is defined by a triangulated mesh and contains a full
3D description of the vertex locations of the mesh. Candide is controlled by global and local Action
Units (AUs). The global AUs correspond to rotations around the three axes. The local AUs control
the mimics of the face, so that different expressions can be obtained. An example of the Candide
wire-frame face model is shown in Figure 1.
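
The effect of the two kinds of Action Units on the mesh can be sketched as follows. This is a
simplified illustration, not the Candide implementation: it assumes that local AUs are per-vertex
displacement vectors scaled by an activation level and applied before the global rotation, and the
three-vertex mesh, the `lower_vertex` unit and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(rx, ry, rz):
    """Global pose: rotate about x, then y, then z (angles in radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    rot_y = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rot_z = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return matmul(rot_z, matmul(rot_y, rot_x))


def apply_local_units(vertices, units, activations):
    """Add each local AU displacement, scaled by its activation level."""
    moved = [list(v) for v in vertices]
    for unit, level in zip(units, activations):
        for index, offset in unit.items():
            for axis in range(3):
                moved[index][axis] += level * offset[axis]
    return moved


def apply_global_pose(vertices, rx, ry, rz):
    """Rotate every vertex by the global pose."""
    r = rotation_matrix(rx, ry, rz)
    return [[sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]
            for v in vertices]


# Hypothetical three-vertex "mesh" and one made-up local unit that moves
# vertex 1 downwards; the real Candide mesh is far larger.
mesh = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lower_vertex = {1: (0.0, -0.5, 0.0)}

expressive = apply_local_units(mesh, [lower_vertex], [1.0])
turned = apply_global_pose(expressive, 0.0, math.pi / 2, 0.0)
```

In this toy case the local unit first changes the expression by moving one vertex, and the global
rotation then turns the whole deformed mesh by 90 degrees around the vertical axis.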

Having the 3D model, it is possible to use standard Computer Graphics texture-mapping
techniques to synthesize as many virtual face images at novel view angles as necessary.
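
One standard building block of such texture mapping is finding, for a pixel inside a projected mesh
triangle in the synthesized view, the matching location in the original photograph. The sketch below
does this with barycentric coordinates. It is a generic illustration rather than the method used by
any Candide implementation, and the function names and triangle coordinates are assumed.

```python
def barycentric(point, triangle):
    """Barycentric weights of a 2D point with respect to a triangle."""
    (x1, y1), (x2, y2), (x3, y3) = triangle
    px, py = point
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def source_point(point, dst_triangle, src_triangle):
    """Map a point in the destination triangle back to the source image."""
    weights = barycentric(point, dst_triangle)
    x = sum(w * v[0] for w, v in zip(weights, src_triangle))
    y = sum(w * v[1] for w, v in zip(weights, src_triangle))
    return x, y


# Assumed coordinates: one mesh triangle as projected in the new view
# (destination) and the same triangle in the original photograph (source).
dst = [(0, 0), (4, 0), (0, 4)]
src = [(10, 10), (18, 10), (10, 14)]

weights = barycentric((1, 1), dst)
sample_at = source_point((1, 1), dst, src)
```

A renderer would repeat this lookup for every pixel of every visible triangle and copy (or
interpolate) the colour found at `sample_at` in the original image. All three weights being
non-negative indicates that the pixel lies inside the triangle.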

There are implementations of Candide available on the Linköping University website for both
Windows and Linux, but they are outdated. The source code is implemented in C++, and
the Windows version does not compile because some files are missing. The Linux version
compiles and generates the executable file, but it requires the user to reduce the video resolution
for the program to work. To obtain a functional program, it was necessary to contact
the authors of [11], who provided the Matlab version presented in Figure 2.
The main disadvantage of this program is that it is implemented in a closed-architecture
tool, which prevents its integration with other programs. The Matlab program fits the Candide
model onto the face image: the face alignment module adapts a generic 3D face model to the
face image to extract facial shape and texture information. From this step, it is possible to obtain
various poses of the face and generate different files from which features are extracted for the
learning step.

Figure 1: Candide-3 with 113 vertices and 168 surfaces.

The idea to use Candide in the PROVE-IT project comes from [12, 11], which adapt Candide
face generation to create new faces as input to an ML algorithm. The problem addressed in [12,
11] is very similar to the problem examined in the PROVE-IT project, which is how to build a
face recognition system from one image (generally a frontal picture) so that it recognizes the
same face under different angles of view. The approach in [12, 11] is centered on the
still-to-still problem, but it gives insights on how to automate the process of creating useful data sets
for the still-to-video applications examined in the PROVE-IT project.

The approach in [12, 11] used the Harris detector to extract facial features and a Hidden Markov
Model (HMM) as the ML algorithm. The architecture of their proposed system is presented in
Figure 3. Our system follows the same ideas, but differs in the features and ML
algorithms used.


4  Discussion  and  Future  Work 

Figure 2: GUI of the synthesis module written in Matlab, showing the Candide model mapped onto
a face image and a synthesized face.

The main objective of this work is to examine the applicability of open source face pose generation
tools, such as Candide, for improving the performance of face recognition ML algorithms. ML
algorithms cannot generate reliable models from one frontal picture, which is what is normally
available in Wanted Lists. For successful face recognition, it is important to be able to dynamically
generate, from the initial frontal picture, new facial poses corresponding to various points of view,
and to make them available to ML algorithms, which will use them to generate better face
models. This problem can be addressed by using the Candide face generation tool, which builds a 3D
face mesh designed to model facial pictures and which allows one to generate new poses from the
original image.

The main advantage of Candide is that it allows one to generate facial images from any visible
position, even if the face view is partially blocked. Another capability is the insertion of
facial expressions, like a smile, into the image. These new poses are useful to an ML algorithm,
which can then learn different expressions and improve its recognition capabilities.
Figure 3: Illustration of HMM pose invariant system.


Candide can be used as part of a larger face recognition system, allowing security personnel to use
the generated face mask as part of their regular work. Their job would be to position the mask on
the subject’s face and then let the system generate a sequence of images corresponding to different
perspectives, to be used automatically by an ML algorithm.

For future work, it is necessary to translate Candide’s Matlab implementation into a C++
library, which would allow the tool to be used to build a larger face recognition system. A standard
library such as OpenCV⁴ is a natural choice, because it is open source, free software with
support for ML and image processing.


⁴ OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning
software library. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, which include a comprehensive set of both classic and
state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize
faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D
models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution